DOMAIN: Electronics and Telecommunication

CONTEXT: A communications equipment manufacturer has a product that is responsible for emitting informative signals. The company wants to build a machine learning model that can predict the equipment's signal quality from various parameters.

DATA DESCRIPTION: The data set contains information on various signal tests performed:

1. Parameters: Various measurable signal parameters.
2. Signal_Quality: Final signal strength or quality

PROJECT OBJECTIVE: The need is to build a regressor that can use these parameters to determine the signal strength or quality (as a number). Steps and tasks:

1. Import data.
2. Data analysis & visualisation
    • Perform relevant and detailed statistical analysis on the data.
    • Perform relevant and detailed uni, bi and multi variate analysis.

Hint: Use your best analytical approach. You can even mix and match columns to create new columns for better analysis. Create your own features if required. Be highly experimental and analytical here to find relevant hidden patterns.

3. Design, train, tune and test a neural network regressor. 
Hint: Use the best approach to refine and tune the data or the model. Be highly experimental here.

4. Pickle the model for future use

Task 1 : Import data.
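The import step can be sketched as below. The actual filename and column names are assumptions; for a self-contained demo the sample is parsed from an inline string instead of a file on disk.

```python
import io
import pandas as pd

# In the actual project the data would be read from disk, e.g.:
#   df = pd.read_csv("signal_quality.csv")   # hypothetical filename
# For a runnable demo, parse a tiny inline sample instead.
sample_csv = io.StringIO(
    "Parameter 1,Parameter 2,Signal_Quality\n"
    "7.4,0.70,5\n"
    "7.8,0.88,5\n"
    "11.2,0.28,6\n"
)
df = pd.read_csv(sample_csv)
print(df.shape)    # rows x columns of the loaded frame
print(df.dtypes)   # confirm all columns parsed as numeric
```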

Task 2. Data analysis & visualisation

• Perform relevant and detailed statistical analysis on the data.
• Perform relevant and detailed uni, bi and multi variate analysis.
Reading sample data
Checking data types and other information
View some basic statistical details like percentile, mean, std etc.
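The mean-versus-median check used in the per-parameter notes below can be sketched on synthetic stand-in data (the real column names and distributions are assumptions); a roughly normal column has mean close to median, while a right-skewed column pulls the mean above the median.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the signal dataset.
df = pd.DataFrame({
    "Parameter 1": rng.normal(10, 2, 500),           # roughly symmetric
    "Parameter 4": rng.lognormal(1.0, 0.6, 500),     # right-skewed, like the long-tailed columns
})

stats = df.describe()   # count, mean, std, percentiles per column
print(stats)

# Compare mean with median for each column.
for col in df.columns:
    mean, median = df[col].mean(), df[col].median()
    print(f"{col}: mean={mean:.2f}, median={median:.2f}, gap={abs(mean - median):.2f}")
```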
Parameter 1:
     The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers. The modality of the data needs to be checked.
Parameter 2:
    The mean and median are close, so the data appears normally distributed. There could be a few outliers; we need to check the data with a boxplot.
Parameter 3:
    The mean and median are close, so the data appears normally distributed. There could be a few outliers; we need to check the data with a boxplot.
Parameter 4:
    The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers.
Parameter 5:
    The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers.
Parameter 6:
    There is a considerable gap between the 75th percentile and the maximum, so there could be outliers. The distribution needs to be checked with a KDE plot.
Parameter 7:
    There is a considerable gap between the 75th percentile and the maximum, so there could be outliers. The distribution needs to be checked with a KDE plot.
Parameter 8:
     The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers.
Parameter 9:
     The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers.
Parameter 10:
     The mean and median are close, so the data appears normally distributed. There is a considerable gap between the 75th percentile and the maximum, so there could be outliers.
The table above gives insight into the distribution of mean and median for each column.
No null values are observed in the cell output above, so no null-value correction is needed.
Univariate data analysis is done below.
As seen in the diagram above, most columns have a roughly normal distribution of data. Multimodality is observed in the Parameter 2 and Parameter 3 columns.
Long tails are observed for Parameter 4, Parameter 5, Parameter 6, Parameter 7 and Parameter 10, so these can be considered right-skewed distributions. For these columns we need to check for outliers and treat them as required.
Viewing the data in boxplots helps to identify outliers.
Since outliers are observed in many columns, let's look at detailed boxplots for the individual columns.
Outliers beyond 1.5*IQR are observed in every column. We shall run the final model on both the outlier-treated data and the untreated data and compare the output.
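The 1.5*IQR rule mentioned above can be sketched as a small capping helper (one common treatment; whether the project caps or drops outliers is not specified, so capping here is an illustrative choice):

```python
import pandas as pd

def cap_iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the fence values."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 100])  # 100 is an obvious outlier
capped = cap_iqr_outliers(s)
print(capped.max())   # the outlier is pulled back to the upper fence
```

Applying this per column would produce the outlier-treated copy of the dataset compared against the untreated one below.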

Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed, 2005. The Effects of Outliers Data on Neural Network Performance. Journal of Applied Sciences, 5: 1394-1398. DOI: 10.3923/jas.2005.1394.1398
The percentage of outliers is very small, so they may not affect the output of the neural network.
As observed, the correlation between any two variables is not strong (here, above 85% can be considered strong), so there is no need to drop any columns.
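The 0.85 correlation screen can be sketched as follows; the data here is synthetic (one pair is made strongly correlated on purpose so the filter has something to catch), and the column names are stand-ins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = rng.normal(size=300)
df = pd.DataFrame({
    "Parameter 1": x,
    "Parameter 2": x * 0.9 + rng.normal(scale=0.1, size=300),  # strongly correlated on purpose
    "Parameter 3": rng.normal(size=300),                       # independent
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is checked once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
strong_pairs = [(r, c) for r in upper.index for c in upper.columns
                if pd.notna(upper.loc[r, c]) and upper.loc[r, c] > 0.85]
print(strong_pairs)
```

In the project's data no pair crosses the threshold, so all columns are retained.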
Observing the graphs above, we can see the density distribution of the data. Parameters 4, 5 and 6 have a denser concentration of data; these columns could be drivers of the regression. As seen in the KDE plots, the data points give a good variety of the data.
In the graphs above, we can observe that the data distribution has a wide range for signal strengths 5, 6 and 7.
For the other three signal-strength values, the distribution is slim.
Although this is a regression problem and signal strength is treated as a continuous variable, it takes only a few distinct values. If we treat signal strength as a categorical column for the time being, the graph above gives some insight into the data.
Making a copy of the raw file, which will be used for network training.
As we can observe, the outliers are reduced in the etRawFileOutliersRemoved dataset. We shall use this in the network to see whether the outliers affect performance.
Preparing dataset for training the network
Design, train, tune and test a neural network regressor.
Training a simple network first to see whether it can find a proper regression; we shall observe the results and decide on further development of the network.
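A baseline of this kind can be sketched as below, assuming TensorFlow/Keras. The data is a synthetic stand-in for the 10 scaled signal parameters; the layer widths and epoch count are illustrative, not the project's tuned values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))                       # stand-in for the 10 signal parameters
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)               # fit on train only to avoid leakage
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),                           # linear output for regression
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=5, batch_size=32, verbose=0)
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=0)
```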
In the model above the loss is still high, so the model needs tuning.

Let's change the network, add one more layer, and change the loss function to see whether the results improve. The cell below contains the code for the outlier dataset.

The graph above gives good insight into the training: it shows a good-fit network.
A good fit lies between an overfit and an underfit model; here there is minimal gap between the two final loss values.
A slight generalisation gap is observed in the graph above.

The cell below contains the code for the outlier-removed dataset.

4. Pickle the model for future use
The graph above gives good insight into the training: it shows a good-fit network.
A good fit lies between an overfit and an underfit model; here there is minimal gap between the two final loss values.
A slight generalisation gap is observed in the graph above.

Calculating the score to evaluate the model.
Using the earlier pickled model to evaluate.
Let's check the difference in the outputs on the test dataset so we can conclude on the performance of the trained model.
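The pickle dump/load round trip used above can be sketched as follows. A scikit-learn MLPRegressor is used here as a runnable stand-in and the filename is hypothetical; the same pattern applies to the trained Keras regressor.

```python
import pickle
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in model and data; in the project this would be the tuned regressor.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.normal(size=200)
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=200, random_state=0).fit(X, y)

with open("signal_model.pkl", "wb") as f:   # hypothetical filename
    pickle.dump(model, f)

with open("signal_model.pkl", "rb") as f:
    restored = pickle.load(f)

# The restored model must reproduce the original predictions exactly.
assert np.allclose(model.predict(X), restored.predict(X))
```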

Observation:

DOMAIN: Autonomous Vehicles

BUSINESS CONTEXT: Recognising multi-digit numbers in photographs captured at street level is an important component of modern-day map making. A classic example of a corpus of such street-level photographs is Google’s Street View imagery, composed of hundreds of millions of geo-located 360-degree panoramic images. The ability to automatically transcribe an address number from a geo-located patch of pixels and associate the transcribed number with a known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents. More broadly, recognising numbers in photographs is a problem of interest to the optical character recognition community. While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs is still highly challenging. This difficulty arises from the wide variability in the visual appearance of text in the wild, on account of a large range of fonts, colours, styles, orientations, and character arrangements. The recognition problem is further complicated by environmental factors such as lighting, shadows, specularity, and occlusions, as well as by image acquisition factors such as resolution, motion, and focus blurs. In this project, we will use a dataset with images centred around a single digit (many of the images do contain some distractors at the sides). Although we are taking a simpler sample of the data, it is more complex than MNIST because of the distractors.

DATA DESCRIPTION: SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal data-formatting requirements, but it comes from a significantly harder, unsolved, real-world problem (recognising digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images. The label for each image is the prominent number in that image (e.g. 2, 6, 7 and 4 respectively for the example images). The dataset has been provided in the form of h5py files.

PROJECT OBJECTIVE: We will build a digit classifier on the SVHN (Street View Housing Number) dataset. Steps and tasks:

    1. Import the data.
    2. Data pre-processing and visualisation.
    3. Design, train, tune and test a neural network image classifier. 
        Hint: Use best approach to refine and tune the data or the model. Be highly experimental here to get the best accuracy out of the model.
    4. Plot the training loss, validation loss vs number of epochs and training accuracy, validation accuracy vs number of epochs plot and write your observations on the same

Import the data.
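Loading from the provided h5py files can be sketched as below. The dataset key names ("X_train", "y_train") and the image shape are assumptions; to keep the snippet runnable, a tiny stand-in file is written first and then read back with the same pattern that would be used on the real SVHN file.

```python
import h5py
import numpy as np

# Write a tiny stand-in file so the read pattern below is runnable;
# the real project would open the provided SVHN .h5 file instead.
with h5py.File("svhn_demo.h5", "w") as f:
    f.create_dataset("X_train", data=np.zeros((8, 32, 32), dtype=np.uint8))
    f.create_dataset("y_train", data=np.arange(8) % 10)

# The read pattern: open the file and materialise datasets as numpy arrays.
with h5py.File("svhn_demo.h5", "r") as f:
    X_train = np.array(f["X_train"])
    y_train = np.array(f["y_train"])
print(X_train.shape, y_train.shape)
```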

Data pre-processing and visualisation.
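The typical pre-processing for a dense classifier on this data can be sketched as: scale pixels to [0, 1], flatten each patch, and one-hot encode the 10 digit classes. The image size and labels below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(8, 32, 32)).astype("float32")  # stand-in grayscale digit patches
y = np.array([2, 6, 7, 4, 0, 9, 1, 3])                        # stand-in digit labels

# Scale pixel intensities to [0, 1] and flatten for a dense network.
X_scaled = X / 255.0
X_flat = X_scaled.reshape(len(X), -1)

# One-hot encode the 10 digit classes.
y_onehot = np.eye(10)[y]
print(X_flat.shape, y_onehot.shape)
```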

Design, train, tune and test a neural network image classifier.

Basic NN Model
The accuracy is not improving after a few epochs. Other parameters can be changed to see whether accuracy improves, and other network architectures should be tried on this data as well.
The same is observed in the cell above, where the accuracy is not improving. Kernel initialization can be introduced, which could help improve accuracy.

Adding kernel initialization with relu activation.

he_normal initialization works better for layers with ReLU activation.
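The idea behind he_normal can be illustrated numerically: it draws weights from N(0, 2/fan_in), which keeps activation variance roughly constant through ReLU layers. This is a numpy sketch of the sampling rule, not the Keras implementation itself.

```python
import numpy as np

def he_normal(fan_in: int, fan_out: int, rng) -> np.ndarray:
    """He normal initialization: weights ~ N(0, 2/fan_in), suited to ReLU layers."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = he_normal(512, 256, rng)
print(W.std())   # empirical std, close to sqrt(2/512) ~= 0.0625
```

In Keras this corresponds to `Dense(..., kernel_initializer="he_normal")`.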
In the cell above, the Adam optimizer has been used with a learning rate of 0.001. A learning rate of 0.0001 with the same beta values was also tested: learning was too slow and we got half the accuracy in the same number of epochs. So it is better to have a balanced learning rate; the learning rate affects how quickly the model can converge to a local minimum and arrive at its best accuracy. It is necessary to observe both val_accuracy and accuracy.
In the cell above, we check different optimizers; SGD and Adagrad also provide good accuracy scores. For SGD and Adagrad, other parameters could be changed and checked, but based on the scores it is better to use the Adam optimizer.
The plot is for the network named keras_model_IRA. This model has 4 hidden layers with ReLU activation and he_normal kernel initialization.
There is no improvement in accuracy and the model needs substantial improvement, either by adding layers or by tuning parameters.
This represents an underfit model, as the gap between the two losses is too large; the model needs improvement.

Adding batch normalization between layers with the Adagrad optimizer

In the model above we used the Adagrad optimizer with batch normalization; the network is trained with a learning rate of 0.001. Each epoch takes around 40 seconds. We can try other optimizers and check whether this can be improved. Early stopping has been introduced to avoid overfitting and stop the training at the appropriate point.
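The combination described above (he_normal initialization, batch normalization, Adagrad at 0.001, early stopping on validation loss) can be sketched as below, assuming TensorFlow/Keras. The input size, data, and patience value are stand-ins, not the project's exact configuration.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64)).astype("float32")               # stand-in flattened images
y = keras.utils.to_categorical(rng.integers(0, 10, size=256), 10)

model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(64, kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),    # normalise pre-activations between layers
    keras.layers.Activation("relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Stop once validation loss has not improved for `patience` epochs.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                           restore_best_weights=True)
history = model.fit(X, y, validation_split=0.2, epochs=20,
                    batch_size=32, callbacks=[early_stop], verbose=0)
```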
Here the validation loss is lower than the training loss, which indicates that the validation dataset may be easier for the model to predict than the training dataset. We could increase the number of epochs and run the model again, but it is better to add batch normalization and check how we can achieve better convergence.

As observed in the graphs above, we arrive at good accuracy and low loss as the number of epochs increases. This can be improved further by using other optimizers and adding batch normalization layers.
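The loss/accuracy-versus-epoch plots discussed here (and requested in task 4) can be sketched as below. The history values are hypothetical numbers standing in for `model.fit(...).history`; a headless backend is used so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend: render to file, no display needed
import matplotlib.pyplot as plt

# Hypothetical values standing in for model.fit(...).history
history = {
    "loss":         [2.1, 1.4, 1.0, 0.8, 0.7],
    "val_loss":     [2.0, 1.5, 1.2, 1.1, 1.0],
    "accuracy":     [0.30, 0.55, 0.68, 0.75, 0.79],
    "val_accuracy": [0.32, 0.52, 0.63, 0.67, 0.70],
}
epochs = range(1, len(history["loss"]) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epochs, history["loss"], label="train loss")
ax1.plot(epochs, history["val_loss"], label="val loss")
ax1.set_xlabel("epoch"); ax1.set_ylabel("loss"); ax1.legend()
ax2.plot(epochs, history["accuracy"], label="train acc")
ax2.plot(epochs, history["val_accuracy"], label="val acc")
ax2.set_xlabel("epoch"); ax2.set_ylabel("accuracy"); ax2.legend()
fig.savefig("training_curves.png")
```

The gap between the train and validation curves is what the generalisation-gap remarks in this report refer to.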

AdaGrad uses the second moment with no decay to deal with sparse features. RMSProp uses the second moment with a decay rate to speed up over AdaGrad. Adam uses both the first and second moments, and is generally the best choice.

reference : https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c

Using Batch Normalization with kernel initialization with relu activation.

Observing the accuracy and val_accuracy, the generalisation gap between the losses is widening. It looks like a case of overfitting, so to avoid this let's try a dropout layer and see how the model converges.

Using Batch Normalization with kernel initialization with relu activation and adding dropout
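An architecture of the kind named in this heading can be sketched as below, assuming TensorFlow/Keras. The layer widths, dropout rate of 0.3, and 1024-dimensional input are illustrative assumptions, not the project's tuned values.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(1024,)),                            # flattened 32x32 patch (assumed size)
    keras.layers.Dense(256, kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Dropout(0.3),     # randomly zero 30% of activations during training
    keras.layers.Dense(128, kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation="softmax"),          # one output per digit class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Dropout is active only during training; at inference time Keras scales activations automatically, so no manual change is needed for evaluation.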

This is to check if we are getting any increase in accuracy.
The loss and accuracy plotted against epochs give good insight into the network's behaviour: they show the change in learning performance over time. The lower the loss, the better the model. No large deviation is observed between the training and validation data.
The same model was also tried with the Adagrad optimizer, but no significant improvement in accuracy was observed.
The generalization gap is not large and model convergence is happening as expected. We could increase the number of epochs and see how it improves, but for now let's evaluate the model on the test data and check the accuracy.
The model has good accuracy, so let's check the output on the test data and compare it with the real labels.

Observation:

It is better to use a dense network with 3 or 4 hidden layers. In this experiment, it is observed that the accuracy score is good when he_normal kernel initialization is used with ReLU. Adding dropout has also helped to improve network accuracy.
Using batch normalization has helped to improve the accuracy. It standardizes the inputs to a layer, applied to the activations of the prior layer. Batch normalization has accelerated training and provided some regularization, thereby reducing generalization error.
Initializing weights through kernel initialization has improved the network performance considerably. It is also observed that these techniques have stabilised the network.
We used predict_classes on some random images, which gave the right output; this gives confidence in the network that was built and trained.
We tried different optimizers (SGD, Adam) on the simple network, but it was unable to improve accuracy and got stuck at one minimum, so accuracy stagnated. We improved the network by adding a few layers with weight initialization, which helped achieve better accuracy, and applying different optimizers helped further. By adding batch normalization and dropout, we got the best results.